Goto

Collaborating Authors

 planning & scheduling



ActiveVOO: Value of Observation Guided Active Knowledge Acquisition for Open-World Embodied Lifted Regression Planning

Neural Information Processing Systems

The ability to actively acquire information is essential for open-world planning under partial observability and incomplete knowledge. However, most existing embodied AI systems either assume a known object category or rely on passive perception strategies that exhaustively gather object and relational information from the environment. Such a strategy becomes insufficient in visually complex open-world settings. For instance, a typical household may contain thousands of novel and uniquely configured objects, most of which are irrelevant to the agent's current task. Consequently, open-world agents must be capable of actively identifying and prioritizing task-relevant objects to enable efficient and goal-directed knowledge acquisition. In this work, we introduce ACTIVEVOO, a novel zero-shot framework for open-world embodied planning that emphasizes object-centric active knowledge acquisition. ACTIVEVOO employs lifted regression to generate compact, first-order subgoal descriptions that identify task-relevant objects, and provides a principled mechanism to quantify the utility of sensing actions based on commonsense priors derived from LLMs and VLMs. We evaluate ACTIVEVOO on the visual ALFWorld benchmark, where it achieves substantial improvements over existing LLMand VLM-based planning approaches, notably outperforming VLMs fine-tuned on ALFWorld data. This work establishes a principled foundation for developing embodied agents capable of actively and efficiently acquiring knowledge to plan and act in open-world environments.


Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLMPlanning with Multifaceted Constraints

Neural Information Processing Systems

Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning is characterized by the need to synthesize a broad spectrum of parallel and potentially conflicting information and constraints. For example, in travel planning scenarios, it requires the integration of diverse real-world information and user preferences.


Non-Clairvoyant Scheduling with Progress Bars

Neural Information Processing Systems

In non-clairvoyant scheduling, the goal is to minimize the total job completion time without prior knowledge of individual job processing times. This classical online optimization problem has recently gained attention through the framework of learning-augmented algorithms. We introduce a natural setting in which the scheduler receives continuous feedback in the form of progress bars--estimates of the fraction of each job completed over time. We design new algorithms for both adversarial and stochastic progress bars and prove strong competitive bounds. Our results in the adversarial case surprisingly induce improved guarantees for learning-augmented scheduling with job size predictions. We also introduce a general method for combining scheduling algorithms, yielding further insights in scheduling with predictions. Finally, we propose a stochastic model of progress bars as a more optimistic alternative to conventional worst-case models, and present an asymptotically optimal scheduling algorithm in this setting.


Learning from Demonstrations via Capability-Aware Goal Sampling

Neural Information Processing Systems

Despite its promise, imitation learning often fails in long-horizon environments where perfect replication of demonstrations is unrealistic and small errors can accumulate catastrophically. We introduce Cago (Capability-Aware Goal Sampling), a novel learning-from-demonstrations method that mitigates the brittle dependence on expert trajectories for direct imitation. Unlike prior methods that rely on demonstrations only for policy initialization or reward shaping, Cago dynamically tracks the agent's competence along expert trajectories and uses this signal to select intermediate steps--goals that are just beyond the agent's current reach--to guide learning. This results in an adaptive curriculum that enables steady progress toward solving the full task. Empirical results demonstrate that Cago significantly improves sample efficiency and final performance across a range of sparse-reward, goal-conditioned tasks, consistently outperforming existing learning from-demonstrations baselines.


Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling

Neural Information Processing Systems

The rise of smart manufacturing under Industry 4.0 introduces mass customization and dynamic production, demanding more advanced and flexible scheduling techniques. The flexible job-shop scheduling problem (FJSP) has attracted significant attention due to its complex constraints and strong alignment with real-world production scenarios. Current deep reinforcement learning (DRL)-based approaches to FJSP predominantly employ constructive methods. While effective, they often fall short of reaching (near-)optimal solutions. In contrast, improvement-based methods iteratively explore the neighborhood of initial solutions and are more effective in approaching optimality.


REMI: Reconstructing Episodic Memory During Internally Driven Path Planning

Neural Information Processing Systems

Grid cells fire in triangular grid patterns, while place cells fire at specific locations and respond to contextual cues. How do these interacting systems support not only spatial encoding but also internally driven path planning, such as navigating to locations recalled from cues? Here, we propose a system-level theory of MEC-HC wiring that explains how grid and place cell patterns could be connected to enable cue-triggered goal retrieval, path planning, and reconstruction of sensory experience along planned routes. We suggest that place cells autoassociate sensory inputs with grid cell patterns, allowing sensory cues to trigger recall of goal-location grid patterns. We show analytically that grid-based planning permits shortcuts through unvisited locations and generalizes local transitions to long-range paths. During planning, intermediate grid states trigger place cell pattern completion, reconstructing sensory experiences along the route. Using a single-layer RNN modeling the HC-MEC loop with a planning subnetwork, we demonstrate these effects in both biologically grounded navigation simulations using RatatouGym and visually realistic navigation tasks using Habitat Sim. Codes for experiments, simulations, and vision encoder are available at 1,2,3.


6f4bb3e0b6331df4b85337c3403c7490-Paper-Conference.pdf

Neural Information Processing Systems

Human behavior is characterized by continuous learning to reduce uncertainties about the world in pursuit of goals. When trying to understand such behavior from observations, it is essential to account for this adaptive nature and reason about the uncertainties that may have led to seemingly suboptimal decisions. Nevertheless, most inverse approaches to sequential decision-making focus on inferring cost functions underlying stationary behavior or are limited to low-dimensional tasks. In this paper, we address this gap by considering the problem of inferring an agent's knowledge or awareness about the environment based on a given trajectory. We assume that the agent aims to reach a goal in an environment they only partially know, and integrates new information into their plan as they act. We propose a Bayesian approach to infer their latent knowledge state, leveraging an approximate navigation model that optimistically incorporates partial information while accounting for uncertainty. By combining sample-based Bayesian inference with dynamic graph algorithms, we achieve an efficient method for computing posterior beliefs about the agent's knowledge. Empirical validation using simulated behavioral data and human data from an online experiment demonstrates that our model effectively captures human navigation under uncertainty and reveals interpretable insights into their environmental knowledge.


Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Neural Information Processing Systems

In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. To this end, we propose Collective Monte Carlo Tree Search (CoMCTS), a new learning-to-reason method for MLLMs, which introduces the concept of collective learning into "tree search" for effective and efficient reasoning-path searching and learning. The core idea of CoMCTS is to leverage collective knowledge from multiple models to collaboratively conjecture, search and identify effective reasoning paths toward correct answers via four iterative operations including Expansion, Simulation and Error Positioning, Backpropagation, and Selection. Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question. With Mulberry-260k, we perform collective SFT to train our model, Mulberry, a series of MLLMs with o1-like step-by-step Reasoning and Reflection capabilities. Extensive experiments demonstrate the superiority of our proposed methods on various benchmarks.


Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

Neural Information Processing Systems

Neural rendering techniques, including NeRF and Gaussian Splatting (GS), rely on photometric consistency to produce high-quality reconstructions. However, in real-world driving scenarios, it is challenging to guarantee perfect photometric consistency in acquired images. Appearance codes have been widely used to address this issue, but their modeling capability is limited, as a single code is applied to the entire image. Recently, the bilateral grid was introduced to perform pixel-wise color mapping, but it is difficult to optimize and constrain effectively. In this paper, we propose a novel multi-scale bilateral grid that unifies appearance codes and bilateral grids. We demonstrate that this approach significantly improves geometric accuracy in dynamic, decoupled autonomous driving scene reconstruction, outperforming both appearance codes and bilateral grids. This is crucial for autonomous driving, where accurate geometry is important for obstacle avoidance and control. Our method shows strong results across four datasets: Waymo, NuScenes, Argoverse, and PandaSet. We further demonstrate that the improvement in geometry is driven by the multi-scale bilateral grid, which effectively reduces floaters caused by photometric inconsistency.